{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "0",
   "metadata": {},
   "source": [
    "# Backtest (high-level API)\n",
    "\n",
    "Tutorial for [NautilusTrader](https://nautilustrader.io/docs/) a high-performance algorithmic trading platform and event driven backtester.\n",
    "\n",
    "[View source on GitHub](https://github.com/nautechsystems/nautilus_trader/blob/develop/docs/getting_started/backtest_high_level.ipynb)."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1",
   "metadata": {},
   "source": [
    "## Overview\n",
    "\n",
    "This tutorial walks through how to use a `BacktestNode` to backtest a simple EMA cross strategy\n",
    "on a simulated FX ECN venue using historical quote tick data.\n",
    "\n",
    "The following points will be covered:\n",
    "- How to load raw data (external to Nautilus) into the data catalog\n",
    "- How to set up configuration objects for a `BacktestNode`\n",
    "- How to run backtests with a  `BacktestNode`"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "2",
   "metadata": {},
   "source": [
    "## Prerequisites\n",
    "- Python 3.11+ installed\n",
    "- [JupyterLab](https://jupyter.org/) or similar installed (`pip install -U jupyterlab`)\n",
    "- [NautilusTrader](https://pypi.org/project/nautilus_trader/) latest release installed (`pip install -U nautilus_trader`)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "3",
   "metadata": {},
   "source": [
    "## Imports\n",
    "\n",
    "We'll start with all of our imports for the remainder of this tutorial."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "4",
   "metadata": {},
   "outputs": [],
   "source": [
    "import shutil\n",
    "from decimal import Decimal\n",
    "from pathlib import Path\n",
    "\n",
    "import pandas as pd\n",
    "\n",
    "from nautilus_trader.backtest.node import BacktestDataConfig\n",
    "from nautilus_trader.backtest.node import BacktestEngineConfig\n",
    "from nautilus_trader.backtest.node import BacktestNode\n",
    "from nautilus_trader.backtest.node import BacktestRunConfig\n",
    "from nautilus_trader.backtest.node import BacktestVenueConfig\n",
    "from nautilus_trader.config import ImportableStrategyConfig\n",
    "from nautilus_trader.core.datetime import dt_to_unix_nanos\n",
    "from nautilus_trader.model import QuoteTick\n",
    "from nautilus_trader.persistence.catalog import ParquetDataCatalog\n",
    "from nautilus_trader.persistence.wranglers import QuoteTickDataWrangler\n",
    "from nautilus_trader.test_kit.providers import CSVTickDataLoader\n",
    "from nautilus_trader.test_kit.providers import TestInstrumentProvider"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5",
   "metadata": {},
   "source": [
    "As a once off before we start the notebook - we need to download some sample data for backtesting.\n",
    "\n",
    "For this example we will use FX data from `histdata.com`. Simply go to https://www.histdata.com/download-free-forex-historical-data/?/ascii/tick-data-quotes/ and select an FX pair, then select one or more months of data to download.\n",
    "\n",
    "Example of dowloaded files:\n",
    "\n",
    "* `DAT_ASCII_EURUSD_T_202410.csv` (EUR\\USD data for month 2024-10)\n",
    "* `DAT_ASCII_EURUSD_T_202411.csv` (EUR\\USD data for month 2024-11)\n",
    "\n",
    "Once you have downloaded the data:\n",
    "\n",
    "1. copy files like above into one folder - for example: `~/Downloads/Data/` (by default, it will use the users `Downloads/Data/` directory.)\n",
    "2. set the variable `DATA_DIR` below to the directory containing the data. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6",
   "metadata": {},
   "outputs": [],
   "source": [
    "DATA_DIR = \"~/Downloads/Data/\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "7",
   "metadata": {},
   "outputs": [],
   "source": [
    "path = Path(DATA_DIR).expanduser()\n",
    "raw_files = list(path.iterdir())\n",
    "assert raw_files, f\"Unable to find any histdata files in directory {path}\"\n",
    "raw_files"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "8",
   "metadata": {},
   "source": [
    "## Loading data into the Parquet data catalog\n",
    "\n",
    "The FX data from `histdata` is stored in CSV/text format, with fields `timestamp, bid_price, ask_price`.\n",
    "Firstly, we need to load this raw data into a `pandas.DataFrame` which has a compatible schema for Nautilus quotes.\n",
    "\n",
    "Then we can create Nautilus `QuoteTick` objects by processing the DataFrame with a `QuoteTickDataWrangler`.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Here we just take the first data file found and load into a pandas DataFrame\n",
    "df = CSVTickDataLoader.load(\n",
    "    file_path=raw_files[0],                                   # Input 1st CSV file\n",
    "    index_col=0,                                              # Use 1st column in data as index for dataframe\n",
    "    header=None,                                              # There are no column names in CSV files\n",
    "    names=[\"timestamp\", \"bid_price\", \"ask_price\", \"volume\"],  # Specify names to individual columns\n",
    "    usecols=[\"timestamp\", \"bid_price\", \"ask_price\"],          # Read only these columns from CSV file into dataframe\n",
    "    parse_dates=[\"timestamp\"],                                # Specify columns containing date/time\n",
    "    date_format=\"%Y%m%d %H%M%S%f\",                            # Format for parsing datetime\n",
    ")\n",
    "\n",
    "# Let's make sure data are sorted by timestamp\n",
    "df = df.sort_index()\n",
    "\n",
    "# Preview of loaded dataframe\n",
    "df.head(2)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "10",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Process quotes using a wrangler\n",
    "EURUSD = TestInstrumentProvider.default_fx_ccy(\"EUR/USD\")\n",
    "wrangler = QuoteTickDataWrangler(EURUSD)\n",
    "\n",
    "ticks = wrangler.process(df)\n",
    "\n",
    "# Preview: see first 2 ticks\n",
    "ticks[0:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "11",
   "metadata": {},
   "source": [
    "See the [Loading data](../concepts/data) guide for further details.\n",
    "\n",
    "Next, we simply instantiate a `ParquetDataCatalog` (passing in a directory where to store the data, by default we will just use the current directory).\n",
    "We can then write the instrument and tick data to the catalog, it should only take a couple of minutes to load the data (depending on how many months)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "12",
   "metadata": {},
   "outputs": [],
   "source": [
    "CATALOG_PATH = Path.cwd() / \"catalog\"\n",
    "\n",
    "# Clear if it already exists, then create fresh\n",
    "if CATALOG_PATH.exists():\n",
    "    shutil.rmtree(CATALOG_PATH)\n",
    "CATALOG_PATH.mkdir(parents=True)\n",
    "\n",
    "# Create a catalog instance\n",
    "catalog = ParquetDataCatalog(CATALOG_PATH)\n",
    "\n",
    "# Write instrument to the catalog\n",
    "catalog.write_data([EURUSD])\n",
    "\n",
    "# Write ticks to catalog\n",
    "catalog.write_data(ticks)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "13",
   "metadata": {},
   "source": [
    "## Using the Data Catalog \n",
    "\n",
    "Once data has been loaded into the catalog, the `catalog` instance can be used for loading data for backtests, or simply for research purposes. \n",
    "It contains various methods to pull data from the catalog, such as `.instruments(...)` and `quote_ticks(...)` (shown below)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "14",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get list of all instruments in catalog\n",
    "catalog.instruments()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "15",
   "metadata": {},
   "outputs": [],
   "source": [
    "# See 1st instrument from catalog\n",
    "instrument = catalog.instruments()[0]\n",
    "instrument"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "16",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Query quote-ticks from catalog\n",
    "start = dt_to_unix_nanos(pd.Timestamp(\"2024-10-01\", tz=\"UTC\"))\n",
    "end =  dt_to_unix_nanos(pd.Timestamp(\"2024-10-15\", tz=\"UTC\"))\n",
    "selected_quote_ticks = catalog.quote_ticks(instrument_ids=[EURUSD.id.value], start=start, end=end)\n",
    "\n",
    "# Preview first\n",
    "selected_quote_ticks[:2]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "17",
   "metadata": {},
   "source": [
    "## Add venues"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "18",
   "metadata": {},
   "outputs": [],
   "source": [
    "venue_configs = [\n",
    "    BacktestVenueConfig(\n",
    "        name=\"SIM\",\n",
    "        oms_type=\"HEDGING\",\n",
    "        account_type=\"MARGIN\",\n",
    "        base_currency=\"USD\",\n",
    "        starting_balances=[\"1_000_000 USD\"],\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "19",
   "metadata": {},
   "source": [
    "## Add data"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "20",
   "metadata": {},
   "outputs": [],
   "source": [
    "str(CATALOG_PATH)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "21",
   "metadata": {},
   "outputs": [],
   "source": [
    "data_configs = [\n",
    "    BacktestDataConfig(\n",
    "        catalog_path=str(CATALOG_PATH),\n",
    "        data_cls=QuoteTick,\n",
    "        instrument_id=instrument.id,\n",
    "        start_time=start,\n",
    "        end_time=end,\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "22",
   "metadata": {},
   "source": [
    "## Add strategies"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "23",
   "metadata": {},
   "outputs": [],
   "source": [
    "strategies = [\n",
    "    ImportableStrategyConfig(\n",
    "        strategy_path=\"nautilus_trader.examples.strategies.ema_cross:EMACross\",\n",
    "        config_path=\"nautilus_trader.examples.strategies.ema_cross:EMACrossConfig\",\n",
    "        config={\n",
    "            \"instrument_id\": instrument.id,\n",
    "            \"bar_type\": \"EUR/USD.SIM-15-MINUTE-BID-INTERNAL\",\n",
    "            \"fast_ema_period\": 10,\n",
    "            \"slow_ema_period\": 20,\n",
    "            \"trade_size\": Decimal(1_000_000),\n",
    "        },\n",
    "    ),\n",
    "]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "24",
   "metadata": {},
   "source": [
    "## Configure backtest\n",
    "\n",
    "Nautilus uses a `BacktestRunConfig` object, which enables backtest configuration in one place. It is a `Partialable` object (which means it can be configured in stages); the benefits of which are reduced boilerplate code when creating multiple backtest runs (for example when doing some sort of grid search over parameters)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "25",
   "metadata": {},
   "outputs": [],
   "source": [
    "config = BacktestRunConfig(\n",
    "    engine=BacktestEngineConfig(strategies=strategies),\n",
    "    data=data_configs,\n",
    "    venues=venue_configs,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "26",
   "metadata": {},
   "source": [
    "## Run backtest\n",
    "\n",
    "Now we can run the backtest node, which will simulate trading across the entire data stream."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "27",
   "metadata": {},
   "outputs": [],
   "source": [
    "node = BacktestNode(configs=[config])\n",
    "\n",
    "results = node.run()\n",
    "results"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.8"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
